Semantic subspace learning for text classification using hybrid intelligent techniques
نویسندگان
چکیده
A vast data repository such as the web contains many broad domains of data which are quite distinct from each other e.g. medicine, education, sports and politics. Each of these domains constitutes a subspace of the data within which the documents are similar to each other but quite distinct from the documents in another subspace. The data within these domains is frequently further divided into many subcategories. In this paper we present a novel hybrid parallel architecture using different types of classifiers trained on different subspaces to improve text classification within these subspaces. The classifier to be used on a particular input and the relevant feature subset to be extracted is determined dynamically by using maximum significance values. We use the conditional significance vector representation which enhances the distinction between classes within the subspace. We further compare the performance of our hybrid architecture with that of a single classifier – full data space learning system and show that it outperforms the single classifier system by a large margin when tested with a variety of hybrid combinations on two different corpora. Our results show that subspace classification accuracy is boosted and learning time reduced significantly with this new hybrid architecture.
منابع مشابه
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملTwo-level text classification using hybrid machine learning techniques
Subspace detection and processing is receivingmore attention nowadays as a method to speed up searchand reduce processing overload. Subspace Learningalgorithms try to detect low dimensional subspaces in thedata which minimize the intra-class separation whilemaximizing the inter-class separation. In this paper wepresent a novel technique using the maximum significance...
متن کاملAutomatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique
The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...
متن کاملDesign and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Int. J. Hybrid Intell. Syst.
دوره 8 شماره
صفحات -
تاریخ انتشار 2011